
    A Bit-Parallel Deterministic Stochastic Multiplier

    This paper presents a novel bit-parallel deterministic stochastic multiplier, which improves the area-energy-latency product by up to 10.6×10^4 while reducing the computational error by 32.2%, compared to three prior stochastic multipliers. Comment: To Appear at IEEE ISQED 202
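    The multiplier builds on the standard stochastic-computing identity: a value in [0, 1] is encoded as the fraction of 1s in a bit-stream, and a bitwise AND of two streams multiplies the encoded values. Below is a minimal Python sketch of the deterministic variant of this arithmetic (a reader's illustration of the general technique, not the paper's bit-parallel circuit), where pairing every bit of one base pattern with every bit of the other makes the product exact rather than a random estimate:

        def det_streams(a_num, a_den, b_num, b_den):
            # Unary base patterns: the first k bits are 1.
            A = [1] * a_num + [0] * (a_den - a_num)
            B = [1] * b_num + [0] * (b_den - b_num)
            n = a_den * b_den
            sa = [A[i % a_den] for i in range(n)]   # A cycles every a_den bits
            sb = [B[i // a_den] for i in range(n)]  # each B bit held for a_den bits
            return sa, sb

        def sc_multiply(a_num, a_den, b_num, b_den):
            sa, sb = det_streams(a_num, a_den, b_num, b_den)
            ones = sum(x & y for x, y in zip(sa, sb))  # one AND gate per bit pair
            return ones, len(sa)                       # exactly a_num*b_num ones

        print(sc_multiply(3, 8, 5, 8))  # (15, 64): 3/8 * 5/8 = 15/64, zero error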

    Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors

    Photonic microring resonator (MRR) based hardware accelerators have been shown to provide disruptive speedup and energy-efficiency improvements for processing deep convolutional neural networks (CNNs). However, previous MRR-based CNN accelerators fail to adapt efficiently to CNNs with mixed-sized tensors, such as depthwise-separable CNNs. Performing inference of CNNs with mixed-sized tensors on such inflexible accelerators often leads to low hardware utilization, which diminishes the achievable performance and energy efficiency. In this paper, we present a novel way of introducing reconfigurability into MRR-based CNN accelerators, to enable dynamic maximization of the size compatibility between the accelerator's hardware components and the CNN tensors processed on them. We classify state-of-the-art MRR-based CNN accelerators from prior works into two categories, based on the layout and relative placement of their hardware components. We then use our method to introduce reconfigurability into accelerators from both categories, thereby improving their parallelism, their flexibility in efficiently mapping tensors of different sizes, their speed, and their overall energy efficiency. We evaluate our reconfigurable accelerators against three prior works under an area-proportionate outlook (equal hardware area for all accelerators). Our evaluation for the inference of four modern CNNs indicates that our reconfigurable CNN accelerators provide improvements of up to 1.8x in frames-per-second (FPS) and up to 1.5x in FPS/W, compared to an MRR-based accelerator from prior work. Comment: Paper accepted at CASES (ESWEEK) 202
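    The utilization problem the abstract describes can be made concrete with a toy model. In the sketch below (a reader's illustration with assumed unit sizes and layer shapes, not the paper's architecture), dot products of length L must occupy whole fixed-size hardware units, so short depthwise filters strand most of the element slots:

        def utilization(vdp_lengths, unit_size):
            """Fraction of element slots doing useful work when each dot
            product of length L occupies ceil(L / unit_size) fixed-size units."""
            used = sum(vdp_lengths)
            occupied = sum(-(-L // unit_size) * unit_size for L in vdp_lengths)
            return used / occupied

        # Standard 3x3x256 conv filters (L = 2304) fill size-64 units exactly,
        # but a depthwise 3x3 filter (L = 9) strands 55 of every 64 slots.
        print(utilization([2304] * 4, 64))  # 1.0
        print(utilization([9] * 4, 64))     # ~0.14

    Reconfigurable hardware that can resize its units to match the tensors at hand is one way to push the second number back toward the first.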

    A Silicon Nitride Microring Based High-Speed, Tuning-Efficient, Electro-Refractive Modulator

    The silicon-on-insulator (SOI) platform has been prominent for realizing CMOS-compatible, high-performance photonic integrated circuits (PICs). But in recent years, the silicon-nitride-on-silicon-dioxide (SiN-on-SiO2) platform has garnered increasing interest as an alternative to SOI, because of several beneficial properties such as low optical losses, high thermo-optic stability, a broader wavelength transparency range, and high tolerance to fabrication-process variations. However, SiN-on-SiO2 based active devices such as modulators are scarce and lack the desired performance, due to the absence of free-carrier activity in the SiN material and the complexity of integrating other active materials with the SiN-on-SiO2 platform. This shortcoming hinders the realization of active PICs on the SiN-on-SiO2 platform. To address it, we demonstrate a SiN-on-SiO2 microring resonator (MRR) based active modulator in this article. Our MRR modulator employs an indium-tin-oxide (ITO)-SiN-ITO thin-film stack, in which the ITO thin films act as the upper and lower claddings of the SiN MRR. The stack leverages the free-carrier-assisted, high-amplitude refractive-index change in the ITO films to effect a large electro-refractive optical modulation in the device. Based on electrostatic, transient, and finite-difference time-domain (FDTD) simulations conducted using photonics foundry-validated tools, we show that our modulator achieves 280 pm/V resonance modulation efficiency, 67.8 GHz 3-dB modulation bandwidth, ~19 nm free-spectral range (FSR), ~0.23 dB insertion loss, and 10.31 dB extinction ratio for optical on-off keying (OOK) modulation at 30 Gb/s.
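    A back-of-envelope check using only the figures reported above (a reader's sketch, not a foundry model; the linewidth comparison is an illustrative assumption) shows what 280 pm/V of tuning efficiency buys in drive voltage:

        EFFICIENCY_PM_PER_V = 280  # reported resonance modulation efficiency
        FSR_NM = 19                # reported free-spectral range

        def drive_voltage(shift_nm: float) -> float:
            """Voltage needed to shift the resonance by shift_nm."""
            return shift_nm * 1e3 / EFFICIENCY_PM_PER_V

        # Shifting the resonance by 0.1% of the FSR (~19 pm, roughly the
        # linewidth scale of a high-Q microring) takes well under a volt:
        print(drive_voltage(FSR_NM * 0.001))  # ~0.068 V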

    AGNI: In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning

    Recent years have seen a rapid increase in research activity in the field of DRAM-based processing-in-memory (PIM) accelerators, where the analog computing capability of DRAM is harnessed, with minimal changes to the inherent structure of DRAM peripherals, to accelerate various data-centric applications. Several DRAM-based PIM accelerators for convolutional neural networks (CNNs) have also been reported. Among these, the accelerators leveraging in-DRAM stochastic arithmetic have shown manifold improvements in processing latency and throughput, due to the ability of stochastic arithmetic to convert multiplications into simple bitwise logical AND operations. However, the use of in-DRAM stochastic arithmetic for CNN acceleration requires frequent stochastic-to-binary number conversions, for which prior works employ full-adder-based or serial-counter-based in-DRAM circuits. These circuits consume large area and incur long latency, and their in-DRAM implementations require heavy modifications of the DRAM peripherals, which significantly diminishes the benefits of using stochastic arithmetic in these accelerators. To address these shortcomings, this paper presents AGNI, a new substrate for in-DRAM stochastic-to-binary number conversion. AGNI makes minor modifications to the DRAM peripherals using pass transistors, capacitors, encoders, and charge pumps, and repurposes the sense amplifiers as voltage comparators, to enable in-situ binary conversion of input stochastic operands of different sizes with iso-latency. Comment: (Preprint) To Appear at ISQED 202
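    Functionally, the conversion AGNI accelerates is a population count: a unipolar stochastic stream of length N encodes the value popcount/N, and prior works compute that popcount with full adders or serial counters. The Python sketch below states this functional behavior (a reference model, not AGNI's analog charge-based circuit):

        def stochastic_to_binary(stream: list[int], bits: int) -> int:
            """Convert a unipolar bit-stream to a bits-wide binary integer."""
            ones = sum(stream)  # popcount: the area/latency-hungry step in prior works
            return (ones * ((1 << bits) - 1)) // len(stream)

        stream = [1, 0, 1, 1, 0, 1, 0, 1]       # encodes 5/8
        print(stochastic_to_binary(stream, 4))  # 9, i.e. ~5/8 of the 4-bit range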

    SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs

    The acceleration of a CNN inference task uses convolution operations that are typically transformed into vector dot product (VDP) operations. Several photonic microring resonator (MRR) based hardware architectures have been proposed to accelerate integer-quantized CNNs with remarkably higher throughput and energy efficiency than their electronic counterparts. However, existing photonic MRR-based analog accelerators exhibit a very strong trade-off between the achievable input/weight precision and the VDP operation size, which severely restricts the achievable VDP operation size at quantized input/weight precisions of 4 bits and higher. The restricted VDP operation size suppresses computing throughput and thereby severely diminishes the achievable performance benefits. To address this shortcoming, we present, for the first time, a merger of stochastic computing and MRR-based CNN accelerators. To leverage the innate precision flexibility of stochastic computing, we invent an MRR-based optical stochastic multiplier (OSM). We employ multiple OSMs in a cascaded manner using dense wavelength division multiplexing, to forge a novel Stochastic Computing based Optical Neural Network Accelerator (SCONNA). SCONNA achieves significantly higher throughput and energy efficiency for accelerating inferences of high-precision quantized CNNs. Our evaluation for the inference of four modern CNNs at 8-bit input/weight precision indicates that SCONNA provides improvements of up to 66.5x, 90x, and 91x in frames-per-second (FPS), FPS/W, and FPS/W/mm^2, respectively, on average over two photonic MRR-based analog CNN accelerators from prior work, with a Top-1 accuracy drop of only up to 0.4% for large CNNs and up to 1.5% for small CNNs. We developed a transaction-level, event-driven Python-based simulator for the evaluation of SCONNA and other accelerators (https://github.com/uky-UCAT/SC_ONN_SIM.git). Comment: To Appear at IPDPS 202
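    As a functional illustration of the arithmetic SCONNA maps onto MRRs and wavelengths (a software model under an assumed unipolar random encoding, not the optical hardware or the linked simulator), a stochastic VDP multiplies element pairs with bitwise ANDs and accumulates the resulting 1s:

        import random

        def to_stream(p: float, length: int) -> list[int]:
            """Encode p in [0, 1] as a random unipolar bit-stream."""
            return [1 if random.random() < p else 0 for _ in range(length)]

        def stochastic_vdp(xs, ws, length=4096):
            acc = 0
            for x, w in zip(xs, ws):
                sx, sw = to_stream(x, length), to_stream(w, length)
                acc += sum(a & b for a, b in zip(sx, sw))  # AND = multiply
            return acc / length                            # sum of products

        random.seed(0)
        print(stochastic_vdp([0.5, 0.25], [0.5, 0.5]))  # ~0.375 = 0.5*0.5 + 0.25*0.5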